
Data and Sources

As the number of COVID-19 cases soars to unprecedented heights around the United States, public health experts and many political figures continue to emphasize mask wearing as one of the most effective ways to slow the spread of the pandemic. But, as a New York Times survey from July 2020 shows, mask wearing adherence varies widely in counties around the nation. What predictors might explain this variation in mask wearing, and how might public health officials use this information to develop more effective mask-wearing interventions? To what extent can mask wearing predict the spread of the virus on a county level?

To address these questions, we plan to create two models: one to predict mask-wearing adherence by county based on a variety of county and state-wide predictors, and one to predict the spread of coronavirus in a county based on mask-wearing. Our data about mask-wearing (which is our outcome in the first model and a predictor in the second model) is from the aforementioned New York Times survey, which was conducted by the survey firm Dynata on behalf of the Times from July 2 to July 14. Aggregated at the county level, it sorts 250,000 individual responses into 3,000 U.S. counties (suggesting that a mixed effects model will likely be a useful approach). The survey asked respondents how often they wore a mask (choices were always, frequently, sometimes, rarely, or never) and presents the percentage of people who gave each answer for every county, which we combined into a single weighted average representing the probability that a randomly selected person is wearing a mask in the county.
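As a minimal sketch of how the weighted average can be computed, the following uses two made-up counties (the response shares are illustrative, not taken from the survey) together with the weights listed for pct_mask in the variable table. The data frame name mask_use is an assumption for this example.

```r
# Toy rows mimicking the five response shares per county (values are made up).
mask_use <- data.frame(
  countyfp   = c("01001", "01003"),
  never      = c(0.10, 0.00),
  rarely     = c(0.10, 0.00),
  sometimes  = c(0.20, 0.00),
  frequently = c(0.20, 0.00),
  always     = c(0.40, 1.00),
  stringsAsFactors = FALSE
)

# Weights from the variable table: always = 1, frequently = 0.75,
# sometimes = 0.5, rarely = 0.25, never = 0.
weights <- c(always = 1, frequently = 0.75, sometimes = 0.5,
             rarely = 0.25, never = 0)

# Weighted sum across the five answer columns, one value per county.
shares <- as.matrix(mask_use[, names(weights)])
mask_use$pct_mask <- as.vector(shares %*% weights)

print(mask_use[, c("countyfp", "pct_mask")])
```

A county where everyone answers "always" gets a value of 1, and a county where everyone answers "never" gets 0, so the result stays on the probability scale.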

Our predictor variables were compiled from a variety of sources and joined with the mask-wearing data by county FIPS code. We included gender, political party, education, and age statistics at the county level because all of these demographics have been shown to differ in mask-wearing frequency in prior surveys, with political party being especially significant. Other data, such as a poll from the Pew Research Center, have suggested that mask wearing varies by race. This, combined with the fact that the pandemic has disproportionately impacted communities of color according to the CDC, motivated us to include variables about the racial composition of counties in our baseline model. Researchers at the National Institutes of Health have suggested that age and location (i.e., rural vs. urban setting) likewise affect mask-wearing behavior, so we included the percentage of seniors in a county (since COVID-19 most severely affects the elderly) and various measures of population density in our mask-wearing model. Finally, we wanted to look beyond county demographics and determine whether coronavirus-related measures, including the number of cases and deaths, the growth rate of the virus at the time of the survey, and local and statewide mask mandates, explained any of the variation in mask wearing by county.
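The join by FIPS code can be sketched as follows. The two small data frames and their values are made up for illustration, and the assumption that one source stores the code as a number (dropping the leading zero) is ours, not something confirmed by the sources listed below.

```r
# Mask data keyed by a five-character FIPS string (toy values).
mask <- data.frame(
  countyfp = c("01001", "01003", "06037"),
  pct_mask = c(0.68, 0.71, 0.92),
  stringsAsFactors = FALSE
)

# A hypothetical demographic source that stores FIPS as a number,
# which silently drops the leading zero ("01001" becomes 1001).
demo <- data.frame(
  countyfp    = c(1001, 6037),
  pct_seniors = c(15.6, 14.1)
)

# Restore the zero-padded five-character key before joining.
demo$countyfp <- sprintf("%05d", as.integer(demo$countyfp))

# Left join: keep every county in the mask data, even without a match.
joined <- merge(mask, demo, by = "countyfp", all.x = TRUE)
print(joined)
```

Keeping all rows from the mask data (all.x = TRUE) means unmatched counties appear with NA in the joined columns rather than being dropped, which makes gaps in a source easy to spot.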

For a complete list of the variables in our clean and compiled dataset and their sources, see the table below.

Variable Names and Descriptions
| Name | Description | Source | Source URL |
|---|---|---|---|
| countyfp | County-level FIPS (Federal Information Processing Standards) code. Unique for each American county. | New York Times | https://github.com/nytimes/covid-19-data/blob/master/mask-use/mask-use-by-county.csv |
| county_name | Name of the county | NA | NA |
| state | State the county is located in | NA | NA |
| pct_mask | An aggregate variable representing the probability that a randomly selected person in a county will wear a mask. Calculated by 1*(always) + 0.75*(frequently) + 0.5*(sometimes) + 0.25*(rarely) + 0*(never) | NA | NA |
| always | Percentage of people who answered they "always" wear a mask | New York Times | https://github.com/nytimes/covid-19-data/blob/master/mask-use/mask-use-by-county.csv |
| frequently | Percentage of people who answered they "frequently" wear a mask | New York Times | https://github.com/nytimes/covid-19-data/blob/master/mask-use/mask-use-by-county.csv |
| sometimes | Percentage of people who answered they "sometimes" wear a mask | New York Times | https://github.com/nytimes/covid-19-data/blob/master/mask-use/mask-use-by-county.csv |
| rarely | Percentage of people who answered they "rarely" wear a mask | New York Times | https://github.com/nytimes/covid-19-data/blob/master/mask-use/mask-use-by-county.csv |
| never | Percentage of people who answered they "never" wear a mask | New York Times | https://github.com/nytimes/covid-19-data/blob/master/mask-use/mask-use-by-county.csv |
| cases_02 | Number of COVID-19 cases on 07/02/2020 | New York Times | https://github.com/nytimes/covid-19-data |
| deaths_02 | Number of COVID-19 deaths on 07/02/2020 | NA | https://github.com/nytimes/covid-19-data |
| cases_14 | Number of COVID-19 cases on 07/14/2020 | New York Times | https://github.com/nytimes/covid-19-data |
| deaths_14 | Number of COVID-19 deaths on 07/14/2020 | NA | https://github.com/nytimes/covid-19-data |
| cases_27 | Number of COVID-19 cases on 07/27/2020 | New York Times | https://github.com/nytimes/covid-19-data |
| deaths_27 | Number of COVID-19 deaths on 07/27/2020 | NA | https://github.com/nytimes/covid-19-data |
| case_growth_1 | cases_14/cases_02 | NA | NA |
| case_growth_2 | cases_27/cases_14 | NA | NA |
| pop_2019 | Population estimate in 2019 | United States Census Bureau | https://www.census.gov/newsroom/press-kits/2019/national-state-estimates.html |
| ru_continuum | 1 to 10 rating on the Rural-Urban Continuum | United States Census Bureau | https://www.census.gov/newsroom/press-kits/2019/national-state-estimates.html |
| density | Population density of the county | county_level_election.csv from class | NA |
| pct_less_than_hs | Percent of adults with less than a high school diploma, 2014-18 | 2014-18 American Community Survey | https://www.ers.usda.gov/data-products/county-level-data-sets/download-data/ |
| pct_hs | Percent of adults with a high school diploma only, 2014-18 | 2014-18 American Community Survey | https://www.ers.usda.gov/data-products/county-level-data-sets/download-data/ |
| pct_some_college | Percent of adults completing some college or associate's degree, 2014-18 | 2014-18 American Community Survey | https://www.ers.usda.gov/data-products/county-level-data-sets/download-data/ |
| pct_college | Percent of adults with a bachelor's degree or higher, 2014-18 | 2014-18 American Community Survey | https://www.ers.usda.gov/data-products/county-level-data-sets/download-data/ |
| pct_poverty | Percentage of people estimated to be living in poverty in 2018 | U.S. Census Bureau, Small Area Income and Poverty Estimates (SAIPE) Program | https://www.ers.usda.gov/data-products/county-level-data-sets/download-data/ |
| pct_female | Percentage of females in county, 2019 | U.S. Census Bureau | https://www.census.gov/newsroom/press-kits/2020/population-estimates-detailed.html |
| pct_black | Percentage of Black/African-American residents in county, 2019 | U.S. Census Bureau | https://www.census.gov/newsroom/press-kits/2020/population-estimates-detailed.html |
| pct_native | Percentage of American Indian or Alaskan Native people in county, 2019 | U.S. Census Bureau | https://www.census.gov/newsroom/press-kits/2020/population-estimates-detailed.html |
| pct_hispanic | Percentage of Hispanic people in county, 2019 | U.S. Census Bureau | https://www.census.gov/newsroom/press-kits/2020/population-estimates-detailed.html |
| pct_seniors | Percentage of adults 65 or over in county, 2019 | U.S. Census Bureau | https://www.census.gov/newsroom/press-kits/2020/population-estimates-detailed.html |
| pct_trump_2016 | Percentage of county who voted for Donald Trump in 2016 | county_level_election.csv from class | NA |
| pct_trump_2020 | Percentage of county who voted for Donald Trump in 2020 | Scraped by GitHub user tonmcg from Fox News, Politico, and New York Times | https://github.com/tonmcg/US_County_Level_Election_Results_08-20 |
| dem_governor | Dummy variable coded 1 if the state has a Democratic governor | National Governors Association | https://www.nga.org/wp-content/uploads/2019/07/Governors-Roster.pdf |
| state_mandate | Dummy variable coded 1 if a statewide mask mandate was enacted before 07/14/2020 | Axios | https://www.axios.com/states-face-coverings-mandatory-a0e2fe35-5b7b-458e-9d28-3f6cdb1032fb.html |
| county_mandate | Dummy variable coded 1 if a county-wide mask mandate was enacted before 07/14/2020 | Harris Institute of Public Policy | https://www.austinlwright.com/covid-research |

Exploratory Data Analysis

First, we wanted to make sure that our response variable pct_mask is distributed approximately normally. Based on this following histogram, the dis

Other variables that needed to be log transformed were pct_seniors, pct_poverty, and all individual race/ethnicity categories. pct_hs did not need to be log transformed, and pct_female looked skewed both with and without a transformation, so we left it untransformed.
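One way to make the "does this need a log transform" decision less dependent on eyeballing histograms is to compare sample skewness before and after the transformation. The sketch below uses simulated right-skewed data as a stand-in for a variable such as pct_poverty; the skewness() helper is defined here for illustration and is not part of our analysis code.

```r
set.seed(1)

# Simulated right-skewed stand-in for a county-level percentage
# (log-normal draws; not real data).
x <- rlnorm(3000, meanlog = 2.5, sdlog = 0.6)

# Moment-based sample skewness: third central moment divided by
# the second central moment raised to the power 1.5.
skewness <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

raw_skew <- skewness(x)
log_skew <- skewness(log(x))

# A value near 0 indicates a roughly symmetric distribution.
print(c(raw = raw_skew, logged = log_skew))
```

If the logged skewness is much closer to 0 than the raw skewness, the transformation is doing its job; if both remain far from 0, as we saw visually with pct_female, the transformation is not helping.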

We also have some missing values in our dataset, which we will need to decide how to impute.

sapply(clean_data_2, function(x) sum(is.na(x)))
##         countyfp      county_name            state         pct_mask 
##                0               30                0                0 
##           always       frequently        sometimes           rarely 
##                0                0                0                0 
##            never         cases_02        deaths_02         cases_14 
##                0               97               97               59 
##        deaths_14         cases_27        deaths_27    case_growth_1 
##               59               42               42               97 
##    case_growth_2         pop_2019     ru_continuum          density 
##               59                0                0                3 
## pct_less_than_hs           pct_hs pct_some_college      pct_college 
##                0                0                0                0 
##      pct_poverty       pct_female        pct_black       pct_native 
##                1                0                0                0 
##     pct_hispanic        pct_asian      pct_seniors   pct_trump_2016 
##                0                0                0               30 
##   pct_trump_2020     dem_governor    state_mandate   county_mandate 
##               32                0                0               10
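The counts above show that case_growth_1 has the same number of missing values as cases_02 (97), and case_growth_2 the same number as cases_14 (59). That pattern is what we would expect if the growth ratios simply inherit NA from their inputs and the counties missing a later count are a subset of those missing an earlier one. The sketch below reproduces that behavior on three made-up rows; the nesting of missingness is an assumption, not something we have verified in clean_data_2.

```r
# Toy case counts with nested missingness (values are made up):
# row 2 is missing only the earliest count, row 3 the first two.
cases <- data.frame(
  cases_02 = c(100, NA, NA),
  cases_14 = c(150, 80, NA),
  cases_27 = c(300, 120, 90)
)

# Growth ratios as defined in the variable table.
cases$case_growth_1 <- cases$cases_14 / cases$cases_02
cases$case_growth_2 <- cases$cases_27 / cases$cases_14

# Same per-column NA count used on clean_data_2.
na_counts <- sapply(cases, function(x) sum(is.na(x)))
print(na_counts)
```

Because any arithmetic involving NA returns NA, imputing the case counts first and then recomputing the ratios would fill the growth variables as well, rather than imputing the ratios separately.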

Baseline Models

To run the linear model, we changed some of the transformations to log(1+X) to avoid taking the log of 0. Specifically, we did this for all four race/ethnicity variables and pct_college. We left out two of the education categories, pct_less_than_hs and pct_some_college, to avoid multicollinearity, but moving forward it might be best to create a single variable that sums pct_college and pct_some_college. We might consider doing the same thing with minority groups. Finally, we removed pct_trump_2020 from the model because the multicollinearity between it and the 2016 percentage was inflating the standard errors. We also removed pop_2019.
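A short illustration of why the shift matters, using a made-up vector of percentages that includes a zero (as can happen for a race/ethnicity share in a small county):

```r
# Made-up percentages; the first county has a share of exactly 0.
pct <- c(0, 0.5, 12.3)

# A plain log sends the zero to -Inf.
plain_log <- log(pct)

# Shifting by 1 keeps every value finite; log(1 + 0) is 0.
shifted_log <- log(1 + pct)

# Base R's log1p() computes the same quantity.
same_as_log1p <- isTRUE(all.equal(shifted_log, log1p(pct)))

print(plain_log)
print(shifted_log)
```

The shift changes the interpretation of the coefficient slightly, since a one-unit change in log(1+X) is no longer a pure proportional change in X, which is worth keeping in mind when reading the model summary.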

# interceptmodel = lm(pct_mask ~ 1, data = clean_data_2)
# 
# fullmodel = lm(pct_mask ~ ru_continuum + log(density) + pct_hs + log(1+pct_college) +
#                log(pct_poverty) + pct_female + log(1+pct_black) + log(1+pct_native) + log(1+pct_hispanic) +
#                log(1+pct_asian) + log(pct_seniors) + log(100-pct_trump_2016) + dem_governor +
#                state_mandate + county_mandate,
#                data = clean_data_2)
# 
# summary(fullmodel)
# 
# interactionmodel = lm(pct_mask ~ (ru_continuum + log(density) + pct_hs + log(1+pct_college) +
#                log(pct_poverty) + pct_female + log(1+pct_black) + log(1+pct_native) + log(1+pct_hispanic) +
#                log(1+pct_asian) + log(pct_seniors) + log(100-pct_trump_2016) + dem_governor +
#                state_mandate + county_mandate)^2,
#                data = clean_data_2)
# 
# selected_model = step(fullmodel, scope = list(lower = formula(interceptmodel), upper = formula(interactionmodel)), 
#      direction = "both", trace = 0)
# 
# summary(selected_model)
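The same selection pattern can be tried in runnable form on R's built-in mtcars data. This is only a sketch: the response and predictors are arbitrary stand-ins for our variables, and the na.omit() step reflects an assumption about how we would handle missing values, since step() stops with an error if the number of rows in use changes between candidate models.

```r
# Stand-in data: complete cases only, so every candidate model
# is fit on the same rows.
dat <- na.omit(mtcars[, c("mpg", "wt", "hp", "qsec")])

# Lower bound of the search: intercept only.
intercept_model <- lm(mpg ~ 1, data = dat)

# Starting point: all main effects.
full_model <- lm(mpg ~ wt + hp + qsec, data = dat)

# Upper bound of the search: main effects plus all two-way interactions.
interaction_model <- lm(mpg ~ (wt + hp + qsec)^2, data = dat)

# Stepwise search by AIC in both directions, starting from full_model.
selected <- step(
  full_model,
  scope = list(lower = formula(intercept_model),
               upper = formula(interaction_model)),
  direction = "both",
  trace = 0
)

print(formula(selected))
```

Because step() only accepts a move that lowers AIC, the selected model's AIC is never higher than that of the starting model, though it is not guaranteed to be the best model within the scope.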